Neural Modeling for Named Entities and Morphology (NEMO2)

Abstract

Named Entity Recognition (NER) is a fundamental NLP task, commonly formulated as classification over a sequence of tokens. Morphologically-Rich Languages (MRLs) pose a challenge to this basic formulation, as the boundaries of Named Entities do not necessarily coincide with token boundaries; rather, they respect morphological boundaries. To address NER in MRLs we then need to answer two questions, namely, what are the units to be labeled, and how can these units be detected and classified in realistic settings, i.e., where no gold morphology is available. We empirically investigate these questions on a novel NER benchmark, with parallel token-level and morpheme-level NER annotations, which we develop for Modern Hebrew, a morphologically rich-and-ambiguous language. Our results show that explicitly modeling morphological boundaries leads to improved NER performance, and that a hybrid architecture, in which NER precedes and prunes morphological decomposition, greatly outperforms the standard pipeline, where morphological decomposition strictly precedes NER, setting a new performance bar for both Hebrew NER and Hebrew morphological decomposition tasks.
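
The mismatch between token boundaries and entity boundaries described in the abstract can be illustrated with a small sketch. This is not the paper's code: the transliterated example, its segmentation, the ORG label, and the helper names `entity_spans` and `token_starts` are all assumptions made for illustration. Each space-delimited token is represented as a list of morphemes with BIO labels, and the sketch checks whether an entity starts in the middle of a token.

```python
from typing import List, Set, Tuple

# Hypothetical data: one inner list per space-delimited token, holding
# (morpheme, BIO label) pairs. The transliteration "l-h-bit h-lbn"
# ("to the White House") and its segmentation are illustrative only.
Token = List[Tuple[str, str]]

sentence: List[Token] = [
    [("l", "O"), ("h", "B-ORG"), ("bit", "I-ORG")],  # "to" + "the" + "house"
    [("h", "I-ORG"), ("lbn", "I-ORG")],              # "the" + "white"
]


def entity_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Collect (start, end, type) spans from BIO labels; end is exclusive."""
    spans = []
    start, etype = None, None
    # A trailing "O" sentinel closes any span still open at the end.
    for i, lab in enumerate(labels + ["O"]):
        continues = lab.startswith("I-") and etype == lab[2:]
        if start is not None and not continues:
            spans.append((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return spans


def token_starts(sent: List[Token]) -> Set[int]:
    """Morpheme indices at which a space-delimited token begins."""
    starts, idx = set(), 0
    for token in sent:
        starts.add(idx)
        idx += len(token)
    return starts


morpheme_labels = [lab for token in sentence for _, lab in token]
spans = entity_spans(morpheme_labels)
# Entities whose first morpheme is not the first morpheme of a token.
misaligned = [s for s in spans if s[0] not in token_starts(sentence)]
print(spans, misaligned)
```

In this toy input the entity begins at the second morpheme of the first token, so a tagger that assigns one label per token could not mark its exact start; labeling morphemes can.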

Similar resources

Separating Named Entities

In this paper, we analyze the situation of long sequences of mostly capitalized words which look like a named entity but in fact consist of several named entities. An example of such a phenomenon is hokejista (hockey player) New York Rangers Jaromír Jágr. Without splitting the sequence correctly, we will wrongly assume that the whole capitalized sequence is a name of the hockey player. To fin...

Named entities from Wikipedia for machine translation

In this paper we present our attempt to improve machine translation of named entities by using Wikipedia. We recognize named entities based on categories of English Wikipedia articles, extract their potential translations from corresponding Czech articles and incorporate them into a statistical machine translation system as translation options. Our results show a decrease of translation quality...

PAYMA: A Tagged Corpus of Persian Named Entities

The goal in the named entity recognition task is to classify proper nouns of a piece of text into classes such as person, location, and organization. Named entity recognition is an important preprocessing step in many natural language processing tasks such as question-answering and summarization. Although many research studies have been conducted in this area in English and the state-of-the-art...

A Priority Model for Named Entities

We introduce a new approach to named entity classification which we term a Priority Model. We also describe the construction of a semantic database called SemCat consisting of a large number of semantically categorized names relevant to biomedicine. We used SemCat as training data to investigate name classification techniques. We generated a statistical language model and probabilistic contextf...

Annotation of Chemical Named Entities

We describe the annotation of chemical named entities in scientific text. A set of annotation guidelines defines 5 types of named entities, and provides instructions for the resolution of special cases. A corpus of full-text chemistry papers was annotated, with an inter-annotator agreement score of 93%. An investigation of named entity recognition using LingPipe suggests that scores of 63% are p...

Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2021

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00404